Tiling and Scheduling of Three-level Perfectly Nested Loops with Dependencies on Heterogeneous Systems
Authors
Abstract
Nested loops are among the most time-consuming parts of many scientific applications and their largest source of parallelism. In this paper, we address the problem of 3-dimensional tiling and scheduling of three-level perfectly nested loops with dependencies on heterogeneous systems. To exploit the parallelism, we tile and schedule nested loops with dependencies, taking into account the computational power of the processing nodes, and execute them in pipeline mode. The tile size plays an important role in improving the parallel execution time of nested loops. We develop and evaluate a theoretical model to estimate the parallel execution time of tiled nested loops. We also propose a tiling genetic algorithm that uses the proposed model to find the near-optimal tile size, minimizing the parallel execution time of nested loops with dependencies. We demonstrate the accuracy of the theoretical model and the effectiveness of the proposed tiling genetic algorithm through several experiments on heterogeneous systems. With the 3D heat equation as a benchmark, 3D tiling reduces the parallel execution time by a factor of 1.2× to 2× over 2D tiling.
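To make the idea of 3D tiling concrete, the following is a minimal serial sketch in Python. It is not the paper's algorithm, cost model, or heat-equation benchmark. The dependence pattern (each point reads its three lower neighbours), the function names `make_grid`, `sweep_untiled`, and `sweep_tiled`, and the tile sizes are all assumptions chosen for illustration. Because every dependence vector is non-negative, visiting rectangular tiles in lexicographic order preserves the original dependencies, so the tiled sweep produces the same values as the untiled one.

```python
def make_grid(n):
    """Build an n x n x n grid with deterministic initial values (illustrative)."""
    return [[[float(i + 2 * j + 3 * k) for k in range(n)]
             for j in range(n)]
            for i in range(n)]


def sweep_untiled(a, n):
    """Three-level perfectly nested loop with dependencies (1,0,0), (0,1,0), (0,0,1)."""
    for i in range(1, n):
        for j in range(1, n):
            for k in range(1, n):
                a[i][j][k] = (a[i - 1][j][k] + a[i][j - 1][k] + a[i][j][k - 1]) / 3.0


def sweep_tiled(a, n, ti, tj, tk):
    """Same computation, split into ti x tj x tk tiles.

    The three outer loops enumerate tile origins; the three inner loops
    walk the points of one tile. min() clips tiles at the grid boundary.
    """
    for ii in range(1, n, ti):
        for jj in range(1, n, tj):
            for kk in range(1, n, tk):
                for i in range(ii, min(ii + ti, n)):
                    for j in range(jj, min(jj + tj, n)):
                        for k in range(kk, min(kk + tk, n)):
                            a[i][j][k] = (a[i - 1][j][k]
                                          + a[i][j - 1][k]
                                          + a[i][j][k - 1]) / 3.0


if __name__ == "__main__":
    n = 9
    reference = make_grid(n)
    tiled = make_grid(n)
    sweep_untiled(reference, n)
    sweep_tiled(tiled, n, ti=4, tj=3, tk=2)
    print("tiled result matches untiled:", reference == tiled)
```

In a parallel setting, each tile would become a unit of work assigned to a processing node, and a tile could start as soon as its three predecessor tiles had finished, which is what allows pipelined execution. The sketch keeps everything serial and only demonstrates that the tiling transformation itself is legal for this dependence pattern.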
Similar resources
On Parameterized Tiled Loop Generation and Its Parallelization
Tiling is a loop transformation that decomposes computations into a set of smaller computation blocks. The transformation has proved to be useful for many high-level program optimizations, such as data locality optimization and exploiting coarse-grained parallelism, and crucial for architecture with limited resources, such as embedded systems, GPUs, and the Cell. Data locality and parallelism w...
Dynamic Scheduling for Dependence Loops on Heterogeneous Clusters
Distributed computing systems are a viable and less expensive alternative to parallel computers. However, concurrent programming methods in distributed systems have not been studied as extensively as for parallel computers. In the past, a variety of dynamic scheduling schemes suitable for loops with independent iterations on heterogeneous computer clusters have been obtained and studied. Howeve...
PrimeTile: A Parametric Multi-Level Tiler for Imperfect Loop Nests
Tiling is a crucial loop transformation for generating high performance code on modern architectures. Efficient generation of multi-level tiled code is essential for maximizing data reuse in systems with deep memory hierarchies. Tiled loops with parametric tile sizes (not compile-time constants) facilitate runtime feedback and dynamic optimizations used in iterative compilation and automatic tu...
A two-level scheduling method: an effective parallelizing technique for uniform nested loops on a DSP multiprocessor
A digital signal processor (DSP), which is a special-purpose microprocessor, is designed to achieve higher performance on DSP applications. Because most DSP applications contain many nested loops and permit a very high degree of parallelism, the DSP multiprocessor has a suitable architecture to execute these applications. Unfortunately, conventional scheduling methods used on DSP multiprocessor...
Graph Transformation for Communication Minimization Using Retiming
Nested loops are normally the most time intensive tasks in computer algorithms. These loops often include multiple dependencies between arrays that impose communication constraints when used in multiprocessor systems. These dependencies may be between dependent arrays (loop dependencies), or between independent arrays (data dependencies). In this paper, reducing the communication caused by data...
Journal title:
- Scalable Computing: Practice and Experience
Volume 17, Issue
Pages -
Publication date 2016